Internet Info 1997 December

Internet Message Format | 1997-02-19 | 5KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9) id FAA27337 for urn-ietf-out; Mon, 4 Nov 1996 05:47:58 -0500
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.6.10/8.6.9) with SMTP id FAA27325 for <urn-ietf@services.bunyip.com>; Mon, 4 Nov 1996 05:47:25 -0500
Received: from josef.ifi.unizh.ch by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA21177 (mail destined for urn-ietf@services.bunyip.com); Mon, 4 Nov 96 05:47:22 -0500
Received: from ifi.unizh.ch by josef.ifi.unizh.ch id <00897-0@josef.ifi.unizh.ch>; Mon, 4 Nov 1996 11:46:39 +0100
Subject: Re: [URN] New syntax draft
To: jon@net.lut.ac.uk
Date: Mon, 4 Nov 1996 11:46:38 +0100 (MET)
Cc: jayhawk@ds.internic.net, urn-ietf@bunyip.com
In-Reply-To: <Pine.SUN.3.95.961101133545.17572f-100000@weeble.lut.ac.uk> from "Jon Knight" at Nov 1, 96 01:44:31 pm
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3326
From: Martin J Duerst <mduerst@ifi.unizh.ch>
Message-Id: <"josef.ifi..248:04.10.96.10.46.47"@ifi.unizh.ch>
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Martin J Duerst <mduerst@ifi.unizh.ch>
Errors-To: owner-urn-ietf@bunyip.com

Jon Knight wrote:
>On Fri, 1 Nov 1996, Martin J Duerst wrote:
>> When going from ASCII to UTF-8, there are some new problems.
>> For ASCII, the general assumption is that only those characters
>> that need escaping are actually escaped, and therefore that
>> escaping, e.g. for "/" or whatever, shouldn't be undone.
>> The above is not exactly true; indeed, "~" is often escaped
>> as %7E in Europe because it is difficult to type on European
>> keyboards, but I still assume there are a lot of tools around
>> working on URLs in general that follow the rule "if ASCII
>> is escaped, don't remove the escaping, because this is a special
>> character."
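A minimal Python sketch of the ordering problem in the quoted paragraph, assuming Python's standard urllib.parse for percent-decoding. The helper names are hypothetical, and the unreserved set is borrowed from the later RFC 3986 rather than from the 1996 syntax drafts. It shows why an escaped "/" must not be undone before the path is parsed, and implements the conservative "only undo escapes of unreserved ASCII characters" rule quoted above.

```python
from __future__ import annotations

import re
import string
from urllib.parse import unquote

# Assumption: RFC 3986 unreserved set; the drafts under discussion
# used slightly different character classes.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def segments_parse_then_unescape(path: str) -> list[str]:
    """Split on literal '/' first, then unescape each segment.
    An escaped '%2F' stays inside its segment as data."""
    return [unquote(seg) for seg in path.split("/")]


def segments_unescape_then_parse(path: str) -> list[str]:
    """Unescape first, then split: '%2F' is now indistinguishable from '/'."""
    return unquote(path).split("/")


def decode_unreserved_only(s: str) -> str:
    """Undo a %XX escape only when the octet is an unreserved ASCII
    character; escaped reserved characters (and any 8-bit octets)
    are left escaped, as in the conservative rule described above."""
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, s)


print(segments_parse_then_unescape("docs/a%2Fb/c"))  # ['docs', 'a/b', 'c']
print(segments_unescape_then_parse("docs/a%2Fb/c"))  # ['docs', 'a', 'b', 'c']
print(decode_unreserved_only("%7Euser/a%2Fb"))       # ~user/a%2Fb
```

Note that decode_unreserved_only leaves an escaped UTF-8 sequence such as %C3%BC untouched, because neither octet is an ASCII character; under this rule a tool cannot know whether such an escape was meaningful or merely a transport convenience.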
>
>Hmm, I don't really understand this; URL percent escaping is just a way of
>dealing with characters that you can't represent directly, either because
>they're in the reserved set of characters or because they're not easy to
>type (the French/German/Nordic/Icelandic letters in ISO-8859-1 for
>example). If you're unescaping things, one would assume that you've
>already parsed any reserved characters to extract any structuring
>information that they provide.

Yes. But now imagine that some of these French/German/Nordic/Icelandic
letters, or whatever else, are reserved, but that somebody was not able
to represent them directly and used %-escaping. What are you going to do?
Also, according to the syntax draft, intermediate software might %-escape
some UTF-8 part to pass the thing through a 7-bit channel. It will
%-escape reserved and unreserved characters alike.

>Percent escaped characters aren't just
>restricted to the reserved characters that need to be escaped; you can
>percent escape your entire URL's path if you like. Any software that just
>expects reserved characters to be percent escaped is going out of its way
>to be braindead IMHO.

The problem is with software that expects reserved characters NOT to be
escaped.

>And anyway, URLs don't use UTF-8 so this isn't going to be a problem
>(we're talking URNs here right?).

I just explained the problems using URL examples. Even if the problems
do not apply to URLs, that does not make them go away for URNs.
And URLs don't use UTF-8 YET. They don't even use ASCII, in the sense
we are proposing to use UTF-8 here. They just encode octets. Fortunately,
for the benefit of us all, there are some coincidences in most of the
URL schemes, so that it actually looks as if URLs used ASCII.
If you currently see a URL with octets in the 8-bit range, there is
not much of a chance to guess what characters it could represent.
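To illustrate the 7-bit gateway case and the charset-guessing problem, here is a hedged Python sketch. escape_all_octets stands in for a hypothetical gateway, and treating "decodes as valid UTF-8" as evidence of UTF-8 is a heuristic of this sketch, not something the syntax draft specifies.

```python
from urllib.parse import unquote_to_bytes


def escape_all_octets(s: str) -> str:
    """Hypothetical 7-bit gateway: %-escape every UTF-8 octet of s,
    reserved and unreserved characters alike."""
    return "".join(f"%{b:02X}" for b in s.encode("utf-8"))


def guess_charset(escaped: str) -> str:
    """Heuristic only: octets that decode as valid UTF-8 are probably
    UTF-8; otherwise nothing says which legacy charset was meant."""
    raw = unquote_to_bytes(escaped)
    try:
        raw.decode("utf-8")
        return "utf-8 (probably)"
    except UnicodeDecodeError:
        return "unknown 8-bit charset"


print(escape_all_octets("Zürich/a"))  # %5A%C3%BC%72%69%63%68%2F%61
print(guess_charset("%5A%C3%BC%72"))  # utf-8 (probably)
print(guess_charset("%5A%FC%72"))     # unknown 8-bit charset
```

After the gateway, the path separator "/" has become %2F, so a tool following the conservative rule can no longer tell it from a deliberately escaped data slash. Reading the UTF-8 octets C3 BC as ISO-8859-1 instead yields "Ã¼" rather than "ü", which is exactly the ambiguity of 8-bit octets in today's URLs.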
But steadily, the number of cases where you have a good chance of getting
the right characters by interpreting the octets as UTF-8 will increase.
One example is ftp: the ftp-wg agreed to move in the direction of using
UTF-8 as the common character encoding. According to the URL spec, that's
up to ftp to decide. Because it is a very reasonable decision, it will
hopefully be followed by others.

The remaining formal difference between URL syntax and URN syntax would
then be that the latter officially allows the URN to be represented with
characters/glyphs outside the ASCII range (on things such as cardboard
boxes), while the former officially disallows such behaviour. In practice,
I guess this difference will vanish rather quickly.

Regards,   Martin.